
feat(quotas): Enforce rate limits on metrics buckets [INGEST-1654] #1515

Merged 40 commits into master on Oct 17, 2022

Conversation

jjbayer (Member) commented Oct 5, 2022

Apply rate limits to metrics buckets from the transactions namespace.

Background

Dynamic Sampling introduces a new type of quota, transactions_processed, which defines how many transactions should be "processed", i.e. whether or not we should extract metrics from them. This quota is always >= the transactions quota, which defines how many transactions should be stored & indexed.

The new data category should not only apply to incoming transaction payloads (see #1517), but also to metrics buckets, which may have been extracted by an upstream Relay.

Moreover, processing Relays must count the number of transactions for which metrics have been dropped, and create outcomes for them, in the same way that we create accepted outcomes (see getsentry/sentry#39236).

Data Flow

Processing Relays check rate limits against Redis. Currently the only actor that has access to this rate limiter is the EnvelopeProcessor, and we want to make use of its thread pool to make the blocking calls to Redis. This makes it necessary to redirect metrics buckets to the processor before sending them to the EnvelopeManager for publishing:

Old Flow (still applies under certain conditions)

 ┌──────────┐    ┌────────────┐ ┌───────────────┐
 │Aggregator│    │ProjectCache│ │EnvelopeManager│
 └────┬─────┘    └─────┬──────┘ └──────┬────────┘
      │                │               │
      │  FlushBuckets  │               │
      ├───────────────►│               │
      │                │  SendMetrics  │
      │                ├──────────────►│
      │                │               │

New Flow

 ┌──────────┐  ┌────────────┐       ┌─────────────────┐ ┌─────┐ ┌───────────────┐
 │Aggregator│  │ProjectCache│       │EnvelopeProcessor│ │Redis│ │EnvelopeManager│
 └────┬─────┘  └─────┬──────┘       └────────┬────────┘ └──┬──┘ └──────┬────────┘
      │              │                       │             │           │
      │ FlushBuckets │                       │             │           │
      ├─────────────►│                       │             │           │
      │              │ RateLimitFlushBuckets │             │           │
      │              ├──────────────────────►│             │           │
      │              │                       ├────────────►│           │
      │              │                       │             │           │
      │              │                       │◄────────────┤           │
      │              │                       │             │           │
      │              │                       │             │           │
      │              │                       │       SendMetrics       │
      │              │                       ├─────────────┬──────────►│
      │              │                       │             │           │

Business Logic

  1. ProjectCache receives a FlushBuckets message from the metrics aggregator.
  2. Count the number of transactions that contributed to these buckets (see below).
  3. Check if cached rate limits apply.
  4. If so, drop transaction-related buckets, generate outcomes, and send the buckets to EnvelopeManager.
  5. Otherwise, if processing mode is enabled, send buckets to the EnvelopeProcessor.
  6. The processor communicates the transaction count to Redis and applies the rate limit if it is reached.
  7. The processor sends a message to the project cache to update the cached rate limits.
  8. Drop transaction-related buckets, generate outcomes, and send the buckets to EnvelopeManager.
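The routing decision in steps 3–5 can be sketched as a small function. This is a hedged sketch: `Destination` and `route_flush` are illustrative names invented here, not Relay's actual types.

```rust
// Hedged sketch of the dispatch in steps 3-5 above; `Destination` and
// `route_flush` are illustrative names, not Relay's actual types.

#[derive(Debug, PartialEq)]
enum Destination {
    /// A cached rate limit applies (or this is not a processing Relay):
    /// drop limited buckets, emit outcomes, send the rest on directly.
    EnvelopeManager,
    /// Processing Relay with no cached limit: consult Redis first.
    EnvelopeProcessor,
}

fn route_flush(cached_limit_active: bool, processing_enabled: bool) -> Destination {
    if cached_limit_active || !processing_enabled {
        Destination::EnvelopeManager
    } else {
        Destination::EnvelopeProcessor
    }
}

fn main() {
    assert_eq!(route_flush(true, true), Destination::EnvelopeManager);
    assert_eq!(route_flush(false, false), Destination::EnvelopeManager);
    assert_eq!(route_flush(false, true), Destination::EnvelopeProcessor);
    println!("ok");
}
```

Either way, the buckets end up at the EnvelopeManager; the only question is whether a Redis round-trip happens first.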

Counting processed transactions

  • For d:transactions/duration@millisecond, increment the counter in redis by the length of the bucket's value. This is an accurate count of the number of transactions that contributed to the bucket.
  • For any other metric, do not increment the counter in redis, but still enforce the rate limit if the quota is currently exceeded.
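The counting rule above can be sketched as follows; `Bucket` here is a simplified stand-in for Relay's bucket type, and the non-counted metric name in `main` is an arbitrary example, not a real Relay metric.

```rust
// Hedged sketch of the counting rule above; `Bucket` is a simplified
// stand-in for Relay's bucket type, not the real definition.

const COUNTED_METRIC: &str = "d:transactions/duration@millisecond";

struct Bucket {
    metric_name: String,
    /// Distribution values; for the duration metric, one entry per transaction.
    values: Vec<f64>,
}

/// Quantity by which to increment the Redis counter for this bucket.
/// A quantity of 0 still checks the limit without counting anything.
fn redis_quantity(bucket: &Bucket) -> usize {
    if bucket.metric_name == COUNTED_METRIC {
        bucket.values.len() // one transaction per recorded duration
    } else {
        0 // check-only: enforce if already exceeded, but do not count
    }
}

fn main() {
    let duration = Bucket {
        metric_name: COUNTED_METRIC.to_owned(),
        values: vec![12.5, 30.0, 7.1],
    };
    let other = Bucket {
        metric_name: "c:transactions/other_example@none".to_owned(),
        values: vec![1.0],
    };
    assert_eq!(redis_quantity(&duration), 3);
    assert_eq!(redis_quantity(&other), 0);
    println!("ok");
}
```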

Not in this PR

  • Enforcement of rate limits on the fast path.
  • Enforcement of rate limits on metrics buckets in the sessions namespace.


jan-auer added a commit that referenced this pull request Oct 7, 2022
For #1515, it is required to check for a required quota of another data
category without incrementing it. This PR updates the Redis LUA script
to support a rate limiting quantity of `0`, which checks for existing
rate limits without incrementing internal counters.

The rate limiter gains a new explicit branch to check whether the
quantity is `0`. 
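A minimal in-memory sketch of that quantity-0 branch, using a local counter as a stand-in for the Redis Lua script (the `Counter` type and method name are assumptions made for illustration, not Relay's API):

```rust
// In-memory stand-in for the Redis counter, showing only the explicit
// quantity-0 branch described above; this is a sketch, not the Lua script.

struct Counter {
    used: u64,
    limit: u64,
}

impl Counter {
    /// Consumes `quantity` from the quota and reports whether the limit
    /// is exceeded. With `quantity == 0`, this only checks for an existing
    /// rate limit and leaves the internal counter untouched.
    fn is_rate_limited(&mut self, quantity: u64) -> bool {
        if quantity == 0 {
            return self.used > self.limit;
        }
        self.used += quantity;
        self.used > self.limit
    }
}

fn main() {
    let mut counter = Counter { used: 0, limit: 2 };
    assert!(!counter.is_rate_limited(0)); // nothing consumed yet
    assert!(!counter.is_rate_limited(2)); // used = 2: at, but not over, the limit
    assert!(counter.is_rate_limited(1)); // used = 3 > 2: limited
    assert!(counter.is_rate_limited(0)); // check-only call sees the active limit
    println!("ok");
}
```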

Co-authored-by: Jan Michael Auer <mail@jauer.org>
* master:
  fix(quotas): Make redis rate limiter work with quantity 0 (#1519)
  ref: Remove unused rate_limits from ProcessEnvelopeState (#1516)
  fix(quotas): Use correct string spelling for TransactionProcessed (#1514)
@jjbayer jjbayer self-assigned this Oct 10, 2022
@jjbayer jjbayer marked this pull request as ready for review October 12, 2022 15:17
@jjbayer jjbayer requested review from a team, olksdr and jan-auer October 12, 2022 15:17
@jan-auer jan-auer changed the title Enforce rate limits on metrics buckets [INGEST-1654] feat(quotas): Enforce rate limits on metrics buckets [INGEST-1654] Oct 13, 2022
match rate_limits {
    Ok(rate_limits) => {
        // If a rate limit is active, discard transaction buckets.
        if let Some(limit) = rate_limits.limit_for(DataCategory::TransactionProcessed) {
jjbayer (Member, Author):
Is it even necessary to filter by data category here, given the item_scoping? Should we use rate_limits.longest() instead?

@@ -601,6 +606,7 @@ impl Project {
///
/// The buckets will be keyed underneath this project key.
pub fn merge_buckets(&mut self, buckets: Vec<Bucket>) {
// TODO: rate limits
jjbayer (Member, Author):
We will add a call to BucketLimiter::enforce_limits() to metrics_allowed() in a follow-up PR.

@jjbayer jjbayer requested a review from olksdr October 17, 2022 07:47
olksdr (Contributor) left a comment:
I have only one question/suggestion, otherwise lgtm!

// outcome is generated.
//
// Returns true if any buckets were dropped.
pub fn enforce_limits(&mut self, rate_limits: Result<&RateLimits, ()>) -> bool {
olksdr (Contributor):

Accepting a Result into the function seems a little strange, since we ignore the Err anyway.
Could we just accept Option<&RateLimits> instead? In my opinion that would make the interface a little cleaner.

jjbayer (Member, Author):

I see your point, given that the Err variant is empty anyway. But I feel that using a Result instead of an Option makes it clear to the reader that this is an error case.

olksdr (Contributor):

When I look at this, my first thought is that all errors should have been handled before calling this function, and we should only get here if we are ok with enforcing the limits (whatever that means in this case).

If we had an Option, None would tell me: there are no limits to enforce, just emit the outcome; and once there is Some(_), enforce what you can 😄

@jjbayer jjbayer merged commit 4ffe338 into master Oct 17, 2022
@jjbayer jjbayer deleted the feat/metrics-enforce-quotas branch October 17, 2022 11:57
jjbayer added a commit that referenced this pull request Oct 24, 2022
### Background

In #1515 we implemented rate-limiting functionality for metrics buckets, which is applied after they are flushed from the metrics aggregator by checking Redis.

### This PR

When a quota is already exhausted, and that information has already been
cached on the project state, there is no need to go through the
aggregator at all. Instead, we apply the rate limiting logic _before_
sending metrics or buckets into the aggregator.

### Implementation details

The utility type `BucketLimiter` was converted to a generic
`MetricsLimiter<T>`, where `T` can be either `Metric` or `Bucket`.
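One plausible shape for such a generic is a trait that both item types implement; the trait and field names below are assumptions made for illustration, not Relay's actual definitions.

```rust
// Illustrative shape of the generic described above; the trait and field
// names are assumptions, not Relay's actual definitions.

trait TransactionCount {
    /// Number of processed transactions this metric or bucket represents.
    fn transaction_count(&self) -> usize;
}

struct MetricsLimiter<T: TransactionCount> {
    items: Vec<T>,
}

impl<T: TransactionCount> MetricsLimiter<T> {
    /// Total transaction count to report to the rate limiter.
    fn total_transactions(&self) -> usize {
        self.items.iter().map(|i| i.transaction_count()).sum()
    }
}

// A dummy item type standing in for `Metric` or `Bucket`.
struct Item(usize);

impl TransactionCount for Item {
    fn transaction_count(&self) -> usize {
        self.0
    }
}

fn main() {
    let limiter = MetricsLimiter { items: vec![Item(3), Item(0), Item(2)] };
    assert_eq!(limiter.total_transactions(), 5);
    println!("ok");
}
```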

### Notes

Will be merged into #1537 &
deployed together.
jan-auer added a commit that referenced this pull request Oct 25, 2022
* master:
  release: 0.8.15
  fix(py): Respect the renormalize flag (#1548)
  (fix)e2e: Use report self hosted issues env variable (#1539)
  meta(vscode): Enable all features in Rust-Analyzer (#1542)
  release: 0.8.14
  build(craft): Fix manylinux artifact name (#1547)
  feat(quotas): New data category for indexed transactions (#1535)
  test(auth): Unflake re_auth_failure (#1531)
  replays: add warning log for parse errors (#1534)
  fix(server): Retain valid cached project states on error (#1426)
  feat(protocol): Implement response context schema (#1529)
  feat(replays): emit org_id on recording kafka messages (#1528)
  feat: Add .NET/Portable-PDB specific protocol fields (#1518)
  feat(quotas): Enforce rate limits on metrics buckets (#1515)
  ref(pii): Consider all token as sensitive [INGEST-1550] (#1527)
  release: 22.10.0
3 participants